A Case Study on the BETH Dataset
2025-08-05
Kim et al. 2016: LSTM classifiers for intrusion detection outperform static baselines.
Malhotra et al. 2016: LSTM encoder-decoder reconstructs normal sequences, anomalies detected by high reconstruction error.
Yin et al. 2017: RNNs generalize better on raw network flows.
Cinque et al. 2022: Micro2vec + LSTM captures log anomalies in microservices.
Old datasets (KDD’99, NSL-KDD, ISCX 2012) are synthetic and outdated.
BETH dataset (Highnam et al. 2021):
Forget Gate:
\(f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)\)
Input Gate and Candidate Update:
\(i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)\)
\(\tilde{C}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)\)
Cell State Update:
\(C_t = f_t * C_{t-1} + i_t * \tilde{C}_t\)
Output Gate and Hidden State:
\(o_t = \sigma(W_o [h_{t-1}, x_t] + b_o), \;\; h_t = o_t * \tanh(C_t)\)
Figure 1: A module of LSTM network (Trinh et al. 2021)
Figure 2: LSTM Autoencoder for Anomaly Detection (Trinh et al. 2021)
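The four gate equations above can be sketched as a single NumPy time step. This is a minimal illustration of the recurrence, not the implementation used in the experiments; the weight shapes and function names are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W_f, W_i, W_c, W_o, b_f, b_i, b_c, b_o):
    """One LSTM time step following the gate equations above.
    Each W_* has shape (hidden, hidden + inputs); z is [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)          # forget gate
    i_t = sigmoid(W_i @ z + b_i)          # input gate
    C_tilde = np.tanh(W_c @ z + b_c)      # candidate update
    C_t = f_t * C_prev + i_t * C_tilde    # cell state update
    o_t = sigmoid(W_o @ z + b_o)          # output gate
    h_t = o_t * np.tanh(C_t)              # hidden state
    return h_t, C_t
```

Because \(o_t \in (0,1)\) and \(|\tanh(C_t)| < 1\), the hidden state stays bounded in \((-1, 1)\).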
Anomaly Detection from Reconstruction Error
\[ L = \frac{1}{T} \sum_{t=1}^T \| x_t - \hat{x}_t \|^2 \]
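Per-sequence reconstruction error follows directly from this definition. A minimal NumPy sketch (`reconstruction_error` is a hypothetical helper name) for a batch of sequences shaped (batch, T, features):

```python
import numpy as np

def reconstruction_error(x, x_hat):
    """L = (1/T) * sum_t ||x_t - x_hat_t||^2 per sequence:
    sum the squared error over features, then average over time."""
    return np.mean(np.sum((x - x_hat) ** 2, axis=-1), axis=-1)
```

Sequences whose error exceeds a chosen threshold are flagged as anomalous.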
| Feature | Description |
|---|---|
| timestamp | Date and time when the event occurred (float) |
| processId | ID of the process generating the event |
| threadId | ID of the thread performing the operation |
| parentProcessId | ID of the parent process |
| userId | User running the process/event |
| mountNamespace | Kernel namespace for filesystem isolation |
| processName | Name of the executable or program |
| hostName | Name or IP of the machine |
| eventId | Numeric identifier for the event |
| eventName | Name/type of system call/event |
| stackAddresses | List of memory addresses (call stack) |
| argsNum | Number of arguments for the event |
| returnValue | Return value of the system call/event |
| args | List of arguments (name, type, value) |
| sus | 1 if flagged suspicious, 0 otherwise |
| evil | 1 if event is malicious, 0 otherwise |
Feature Selection
Encoding:
One-hot encoding for categorical features
Sequence Generation: sliding window over the event stream; logs reshaped into 3D tensors (samples, timesteps, features)
Sequence Labeling:
Normal: Only benign events
Anomalous: At least one suspicious (Sus) or malicious (Evil) event
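The windowing and labeling steps above can be sketched as follows, assuming the events are already numerically encoded; `make_sequences` and the window length of 8 (matching the tensor shapes reported below) are illustrative:

```python
import numpy as np

def make_sequences(events, labels, window=8):
    """Slide a fixed-length window over the event log, producing a 3D
    tensor (num_windows, window, num_features). A window is labelled
    anomalous (1) if it contains at least one sus/evil event."""
    X, y = [], []
    for start in range(len(events) - window + 1):
        X.append(events[start:start + window])
        y.append(int(labels[start:start + window].any()))
    return np.asarray(X, dtype=np.float32), np.asarray(y)
```

With a stride of 1, consecutive windows overlap, which preserves the local ordering of system calls around each event.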
Trained on normal sequences only
Symmetric autoencoder: stacked LSTM + bottleneck
Optimized via Adam, minimizing reconstruction loss
Generalization: early stopping, dropout, 5-fold CV
Hyperparameters tuned based on prior work (Nguyen et al. 2021; Malhotra et al. 2016).
Threshold selection via the validation reconstruction-error distribution (precision/recall/F1 trade-off, see Figure 9)
Evaluation metrics: precision, recall, F1-score, ROC AUC
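One plausible reading of the threshold-selection step, given the precision/recall/F1-vs-threshold plot in Figure 9, is an F1-maximizing sweep over candidate thresholds on validation reconstruction errors. The helper below is a hypothetical sketch, not the authors' exact procedure:

```python
import numpy as np

def select_threshold(val_errors, candidate_grid, y_val):
    """Sweep candidate thresholds; keep the one maximizing F1 on the
    validation set (errors above the threshold are flagged anomalous)."""
    best_t, best_f1 = None, -1.0
    for t in candidate_grid:
        pred = (val_errors > t).astype(int)
        tp = np.sum((pred == 1) & (y_val == 1))
        fp = np.sum((pred == 1) & (y_val == 0))
        fn = np.sum((pred == 0) & (y_val == 1))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

An alternative, common when only normal data is available, is to set the threshold at a high percentile of the normal-only validation errors.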
X_train shape: (763137, 8, 9)
X_val shape: (188960, 8, 9)
X_test shape: (188960, 8, 9)
y_test distribution: [ 30528 158432]
Model: "functional"
| Layer (type) | Output Shape | Param # |
|---|---|---|
| input_layer (InputLayer) | (None, 8, 9) | 0 |
| lstm (LSTM) | (None, 8, 128) | 70,656 |
| dropout (Dropout) | (None, 8, 128) | 0 |
| lstm_1 (LSTM) | (None, 64) | 49,408 |
| dropout_1 (Dropout) | (None, 64) | 0 |
| repeat_vector (RepeatVector) | (None, 8, 64) | 0 |
| lstm_2 (LSTM) | (None, 8, 64) | 33,024 |
| dropout_2 (Dropout) | (None, 8, 64) | 0 |
| lstm_3 (LSTM) | (None, 8, 128) | 98,816 |
| dropout_3 (Dropout) | (None, 8, 128) | 0 |
| time_distributed (TimeDistributed) | (None, 8, 9) | 1,161 |
Total params: 253,065 (988.54 KB)
Trainable params: 253,065 (988.54 KB)
Non-trainable params: 0 (0.00 B)
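The architecture in the summary can be reproduced with a Keras sketch like the following. Layer sizes are read off the summary table; the dropout rate and optimizer settings are assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

TIMESTEPS, FEATURES = 8, 9  # matches the (None, 8, 9) input above

inputs = keras.Input(shape=(TIMESTEPS, FEATURES))
# Encoder: stacked LSTMs down to a 64-dim bottleneck
x = layers.LSTM(128, return_sequences=True)(inputs)
x = layers.Dropout(0.2)(x)
x = layers.LSTM(64)(x)
x = layers.Dropout(0.2)(x)
# Repeat the bottleneck vector for each timestep, then decode
x = layers.RepeatVector(TIMESTEPS)(x)
x = layers.LSTM(64, return_sequences=True)(x)
x = layers.Dropout(0.2)(x)
x = layers.LSTM(128, return_sequences=True)(x)
x = layers.Dropout(0.2)(x)
outputs = layers.TimeDistributed(layers.Dense(FEATURES))(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")  # MSE = reconstruction loss above
```

The parameter counts fall out of the gate equations: e.g. the first LSTM has 4 × 128 × (9 + 128 + 1) = 70,656 weights, and the totals sum to the 253,065 reported.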
5-fold cross-validation: stable convergence across all folds.
Figure 7: Training/validation loss curves
Distribution of reconstruction error separates normal vs. anomalous.
Figure 8: Histogram of test reconstruction errors (MAE)
Precision: 99.3%, Recall: 99.5%, F1-score: 99.4%.
Figure 9: Precision, recall, and F1-score vs. threshold
Figure 10: Confusion matrix for final predictions
ROC AUC = 99.5%.
Figure 11: ROC curve
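The ROC analysis treats each sequence's reconstruction error as its anomaly score, so no threshold is needed. A sketch with scikit-learn (function and array names hypothetical):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def evaluate_roc(errors, y_true):
    """Compute the ROC curve and AUC, using each sequence's
    reconstruction error as its score (higher = more anomalous)."""
    fpr, tpr, _ = roc_curve(y_true, errors)
    auc = roc_auc_score(y_true, errors)
    return auc, fpr, tpr
```

A perfect separation of normal and anomalous errors yields AUC = 1.0; the 99.5% reported above indicates near-perfect ranking.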
Key Results
- LSTM autoencoder successfully learned normal process behavior from the BETH dataset.
- Achieved 99% accuracy and F1-score of 0.95 in detecting malicious sequences.
- High recall → most attacks detected (low false negatives).
- Low false positive rate (3.4%) → suitable for real-world deployment.
Implications
- Detects novel/zero-day attacks without prior knowledge.
- Fits real-time monitoring in dynamic/cloud environments.
- Future work: add attention, ensemble methods, and periodic retraining to improve adaptability.